ISSS608: AY2021-22(T2): DataViz Makeover 1 Data Preparation Log

This page contains all data preparation steps needed to reproduce the Tableau Visualisation in DataViz Makeover 1

Melissa Tan (SMU MITB Analytics Track)https://scis.smu.edu.sg/master-it-business/analytics-track/curriculum?gclid=CjwKCAiA3L6PBhBvEiwAINlJ9EJwYxpaZv-zPxR0UMntDh37TrlWU7jwXP9Dcu9jvWvN8uEJsOWzTRoCqrQQAvD_BwE
2022-02-13

Data Cleaning on Tableau Prep

Step 1:

Download the dataset Resident Labour Force Participation Rate by Age and Sex from the Ministry of Manpower Singapore’s website.

Open the file on Excel and visually inspect the data. There are 2 worksheets in the file. Since the original chart segmented ages in levels of 5 years, the second mrsd_Res_LFPR_2 worksheet was selected for use in the makeover. The dataset contains Resident Labour Force Participation Rate by Age and Sex from 1991 to 2021.

Step 2:

Create 3 copies of mrsd_Res_LFPR_2 in the Excel file.

For each copy, keep only 1 subset of the original data(i.e. Total, Male and Female respectively) and remove the rest.

Step 3

Create a new flow by opening Tableau Prep Builder and load the Excel file. Drag and drop the mrsd_Res_LFPR_2(Total) worksheet into the main pane and initiate the data cleaning feature by clicking on the “+” button and selecting “Clean Step”.

Step 4:

On the lower pane,we can see that there are unusable null values. To remove these, click on the button with the three dots, followed by filter, and Null Values. Select the ‘Keep only Non-Null Values’ option of the dialog box

Step 5:

Next, filter and remove the 20 irrelevant Year columns (1991-2009 inclusive) by selecting columns F2 to F21, clicking on the button with the three dots, and selecting Remove.

Step 6:

Correctly name the remaining Year columns F22-F22 by choosing the Rename Field option and double clicking on the default column name to edit.

Then, exclude the row containing the years as values. Only the LPFR values for each age category will remain.

Step 7:

Create a pivot table by clicking on the + circle in the flow diagram. A new Pivot icon should appear in the flow. In the resultant lower frame, Select all the Year Fields from the left-most pane and drag it to the adjacent pane titled Pivoted Fields. Ensure that the option columns to Rows is selected. Rename the resulting column LFPR (Total) and all other pivoted field names. The pivot table is ready.

Step 8:

Send the cleaned data into an output to allow it to be used in Tableau Desktop. Click on the + circle and select Output. A new Output icon should appear in the flow diagram. In the lower frame, choose the appropriate output type (csv in this case), and a new file name before clicking Run Flow.

Step 9:

Repeat Steps 3 to 8 for the mrsd_Res_LFPR_2(F) and mrsd_Res_LFPR_2(M) Tables

Step 10:

As the difference between the LPFR of the 2 genders for any given year are needed in the visualization, a new calculated field has to be derived. Drag and hover one of the cleaned pivot table from either the mrsd_Res_LFPR_2(F) or mrsd_Res_LFPR_2(M) flows to the other and activate a Join action. A new Join icon will appear linking the 2 data tables.

Step 11:

Set the join clauses by linking the respective Year and Age fields and choosing a full Join. Then click on Create Calculated Field and specify the formula for deriving the new values ([LFPR(M)]-[LFPR(F)]). A new Field column will be formed.

Step 12:

Finally, for the data visualisation showing the aggregate difference between the LFPR of Males and Females, create a new data subset in a new excel worksheet by extracting the Males and Females row from mrsd_Res_LFPR_2. Create a formula to calculate the difference between the values in each year.

Import the new worksheet and data table into Tableau Prep. Remove the 3 highlighted rows (Female, Male and Year) by using Exclude, retain on the Diff row.

The final set of flow diagrams should look like this:

Data Visualisation on Tableau Desktop

Step 14:

Create a new connection in Tableau to load all the prepared data files. Drag and drop the 5 highlighted csv files into the main frame. Create relationships between them by linking the Year fields of each fhile to one another. Ensure the Extract radio button is checked.

Chart 1 - Overall LFPR over time, segmented by Age

Step 15:

Create a new worksheet and set up the axes of Chart 1 by dragging the Year and LFPR(Total) fields in the left hand menu into the Columns and Row Shelves respectively. To view the data by age segments, drag the Age field into the Color and Label boxes for the default colours and labels to show.

Step 16:

To exclude selected Age segments, drag the Age field into the Filters box. Select the field to be filtered out (70 & over) and Check the Exclude box.

Step 17:

To change the default colours of the line graphs, click on the top right hand button on the Legend card on the right of the frame. Select Edit Colours and from the palette, change to the appropriate colors as needed.

Step 18:

To change the axis range and title, open the Edit Axis dialog box by right-clicking at the horizontal axis. Choose a fixed range (2010-2022) and remove the title since the axis value are self-explanatory.

Step 19:

Edit the Chart title by double clicking on the top of the pane. Use matching mark colours to associate the observations in the write-up to the lines on the chart.

Step 20:

Add a reference line to indicate the occurence of COVID-19 by right clicking on the horizontal axis and choosing Add Reference Line. Click on the Value drop down menu and create a new Parameter COVID-19 with a current value of 2020. Select Computation as the Label to display ‘COVID-19’.

Step 21:

To edit the labels of the Age filter, Click on the Age blue pill in the side bar and select Edit Aliases from the menu. Double click on any cell in the Value (Alias) column and input the new label.

Step 22:

The final Chart 1 should look like this:

Chart 2 - Changes in LFPR with Age

This chart shows how the LFPR changes across the working lifespan of residents, as well as the spread for each demographic group over the years.

Step 23:

Create a new worksheet and set up the axes of Chart 2 by dragging the Age and LFPR(Total) fields in the left hand menu into the Columns and Row Shelves respectively. To view the data by age segments, drag the Year field into the Color box. In the Analytics tab on the top left hand corner of the sidebar, add an Average Line for each cell. Make the lines orange so it stands out and display the values under each line.

Add the title and insights, matching the text colours with the graph labels for easier reference. The final chart should look like this.

Chart 3 - Older Ages’ LFPR have a wider spread

This chart investigates in greater detail how the LFPR of older workers changed over time.

Step 24:

Create a duplicate of Chart 2. Filter and retain the age groups from 55 and above. Select Box Plot from the Analytics tab and add a plot for each cell.

Add annotations to flag out the increase in LFPR for the 65-69 age group. Add title and insights, matching the text colours with the graph labels for easier reference. The final chart should look like this.

Chart 4 - Overall LFPR by Gender

This chart show how the LPFR of Males and Females changed over time.

Step 25:

Create a new worksheet and set up the axes of Chart 4 by dragging the Year field to the Column Shelf and both LFPR(F) and LFPR(M) fields to the Row Shelf. Select Dual Axis to over the 2 lines on the same axes and select Synchronise Axes to allow the removal of the y-axis on the right hand side.

Drag the Age field to the Detail box. Filter and retain on the aggregate values i.e. Females and Males.

Step 26:

At the sidebar Marks menu, change the chart type from Automatic to Area for both lines. Change the colours to show blue for Males and pink for Females.

Step 27:

Since we are using this chart to zoom into the gender difference, adjust the y-axis to a fixed range from 50 to 80 so that we can take a close-up look. Show the mark labels for both Male and Female.

Your final Chart 4 should look like this:

Chart 5 - Overall LFPR Gender Gap

This chart show how the gap between the LPFR of Males and Females changed over time.

Step 28:

Create a new worksheet and set up the axes of Chart 5 by dragging the Year(AggDiff) field to the Column Shelf and the LFPR(AggDiff) field to the Row Shelf. from the Colour box, change the Effects of the markers to show the points at each year.

Step 29:

Add a reference line to show the starting year value (20.0%) and include annotations to explain how the gap was calculated. Add title and insights.

Your final Chart 5 should look like this:

Chart 6 - Age-segmented LFPR Gender Gap

This chart show how the gap between the LPFR of Males and Females differed across the demographic profiles

Step 30:

Create a new worksheet and set up the axes of Chart 6 by dragging the Year field to the Column Shelf and the Gender Diff field to the Row Shelf. Drag Age field into the Colour box

Step 31:

Change the colours of the different age groups by highlighting the interesting lines only (20-24, 25-29, 70-74, 75 & Over) while greying out the rest.

Select the 2021 data points of these 4 highlighted lines and Show Label.

Step 32:

Use matching colours in the insights text to reference the relevant age groups’ chart lines.

The final Chart 6 should look like this:

Chart 7 - Zoomed in view of Gender Gap for Young Adults (20-24)

This chart shows a close up of how the LFPR gap changed for the 20-24 age groups over time.

Step 33:

Create a new worksheet and set up the axes of Chart 7 by dragging the Year field to the Column Shelf and the 2 Gender Diff fields to the Row Shelf. Filter and retain only the 20-24 age group.

Select Line for the first Mark and Area of the second Mark and Select Dual Axis to overlay the 2 plots. This allows the area chart to have a more prominent outline.

Under Colour, choose Blue and lower the opacity of the shading to 20%

Step 34:

Repeat step 28 but this time make the area under the line pink.

Step 35:

Combine the 2 plots by adjusting the upper plot y-axis to display from 0 to 8 and the lower plot y-axis to display from -8 to 0. This will make the areas above zero show blue, representing Males and the areas below zero show pink, representing Females. Add annotation and headers.

The final Chart 7 should look like this:

Chart 8 - Zoomed in view of Gender Gap for Young Adults (24-29)

This chart shows a close up of how the LFPR gap changed for the 24-29 age groups over time.

Step 36:

Repeat Steps 28 to 30 but use the 24-29 Age filter instead.

Chart 8 should look similar, like this:

Creating Dashboards

This is to arrange the charts in a viewer-friendly layout that also tells the story.

Step 37:

Create a new Dashboard worksheet on Tableau. Drag the first sheet LFPR(Overall) to the top of the canvas and drag next 2 to the lower half of the canvas. Remove the Legend from the Dashboard and include an umbrella header by dragging a blank Text box from the Objects menu in the left side bar to the desired location at the top.

Step 38:

Create another Dashboard worksheet on Tableau. Drag the remaining sheets to each of the 4 quarters of the canvas. For the last quarter on the bottom right, adjust the 2 charts so that they are the same size.

Include an umbrella header by dragging a blank Text box from the Objects menu in the left side bar to the desired location at the top.

The final Dashboard should look like this:

FINAL VISUALISATION

The final visualisation should look like this: